One sketch for all: Theory and Application of Conditional Random Sampling
Authors
Abstract
Conditional Random Sampling (CRS) was originally proposed for efficiently computing pairwise (l2, l1) distances in static, large-scale, sparse data. This study modifies the original CRS and extends it to handle dynamic or streaming data, which reflect real-world situations much better than the assumption of static data. Compared with many other sketching algorithms for dimension reduction, such as stable random projections, CRS exhibits a significant advantage in that it is “one-sketch-for-all.” In particular, we demonstrate the effectiveness of CRS in efficiently computing the Hamming norm, the Hamming distance, the lp distance, and the χ² distance. A generic estimator and an approximate variance formula are also provided for approximating any type of distance. We recommend CRS as a promising tool for building highly scalable systems in machine learning, data mining, recommender systems, and information retrieval.
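The “one-sketch-for-all” idea can be illustrated with a minimal Python sketch of the static case. It is not the paper's implementation. It assumes one shared random permutation of column IDs, a sketch that keeps the k nonzero entries with the smallest permuted IDs, and a simple scaling estimator that multiplies the sample distance by D / D_s. The names build_sketch and estimate_l1 are hypothetical, and the streaming extension described in the abstract is not covered.

```python
import random


def build_sketch(row, perm, k):
    """Build a sketch of one sparse row (hypothetical helper).

    row:  dict mapping column index -> nonzero value
    perm: list mapping column index -> permuted ID (shared by all rows)
    k:    maximum number of entries to keep

    Returns (entries, limit). Every nonzero of the row whose permuted ID
    is below `limit` is guaranteed to be present in `entries`.
    """
    entries = sorted((perm[col], val) for col, val in row.items())
    if len(entries) <= k:
        # The whole row fits in the sketch, so it covers every column.
        return entries, len(perm)
    kept = entries[:k]
    return kept, kept[-1][0] + 1


def estimate_l1(sketch_a, sketch_b, num_cols):
    """Estimate the l1 distance between two rows from their sketches."""
    (entries_a, limit_a), (entries_b, limit_b) = sketch_a, sketch_b
    # Effective sample size, known only once both sketches are seen.
    d_s = min(limit_a, limit_b)
    a = {i: v for i, v in entries_a if i < d_s}
    b = {i: v for i, v in entries_b if i < d_s}
    sample_dist = sum(abs(a.get(i, 0.0) - b.get(i, 0.0)) for i in set(a) | set(b))
    # Scale the distance over the first d_s permuted columns up to all columns.
    return sample_dist * num_cols / d_s


# Usage: the same two sketches could serve other distances as well.
num_cols = 1000
perm = list(range(num_cols))
random.Random(0).shuffle(perm)
row_a = {3: 1.0, 10: 2.0, 500: 1.5, 720: 4.0}
row_b = {3: 2.0, 44: 1.0, 500: 1.5, 901: 3.0}
estimate = estimate_l1(build_sketch(row_a, perm, 3), build_sketch(row_b, perm, 3), num_cols)
```

Because the sketch stores the sampled values themselves rather than a distance-specific projection, replacing the absolute difference in estimate_l1 with another per-column term would reuse the same sketches for a different distance.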
Similar resources
Conditional Random Sampling: A Sketch-based Sampling Technique for Sparse Data
Abstract: We develop Conditional Random Sampling (CRS), a technique particularly suitable for sparse data. In large-scale applications, the data are often highly sparse. CRS combines sketching and sampling in that it converts sketches of the data into conditional random samples online in the estimation stage, with the sample size determined retrospectively. This paper focuses on approximating p...
Comparison of marginal logistic models with repeated measures and conditional logistic models in examining factors affecting hypertension
Background and purpose: To analyze data in which the correlation between observations must be considered, a common method is the marginal model with repeated measures; another method is the conditional model with random clusters. Given binary responses, the aim of the present study is to compare the efficiency of these two models in studying the risk factors a...
Application of Sequential Gaussian Conditional Simulation to Underground Mine Design Under Grade Uncertainty
In mining projects, all uncertainties associated with a project must be considered to determine the feasibility study. Grade uncertainty is one of the major components of technical uncertainty that affects the variability of the project. Geostatistical simulation, as a reliable approach, is the most widely used method to quantify risk analysis to overcome the drawbacks of the estimation methods...
Efficient Simulation of a Random Knockout Tournament
We consider the problem of using simulation to efficiently estimate the win probabilities for participants in a general random knockout tournament. Both of our proposed estimators, one based on the notion of “observed survivals” and the other based on conditional expectation and post-stratification, are highly effective in terms of variance reduction when compared to the raw simulation estimato...
Application of the theory of reasoned action to promoting breakfast consumption
Background: Breakfast is the most important daily meal, but neglected more than other meals by children and adolescents. The aim of this study was to evaluate the effectiveness of an educational intervention, based on the Theory of Reasoned Action (TRA) to increase breakfast consumption among school children in Bandar Abbas, Iran. Methods: In this quasi experimental study which was conducted...